Incentivizing Exploration with Heterogeneous Value of Money
Recently, Frazier et al. proposed a natural model for crowdsourced
exploration of different a priori unknown options: a principal is interested in
the long-term welfare of a population of agents who arrive one by one in a
multi-armed bandit setting. However, each agent is myopic, so in order to
incentivize him to explore options with better long-term prospects, the
principal must offer the agent money. Frazier et al. showed that a simple class
of policies, called time-expanded policies, is optimal in the worst case, and
characterized their budget-reward tradeoff.
The previous work assumed that all agents are equally and uniformly
susceptible to financial incentives. In reality, agents may have different
utility for money. We therefore extend the model of Frazier et al. to allow
agents that have heterogeneous and non-linear utilities for money. The
principal is informed of the agent's tradeoff via a signal that could be more
or less informative.
Our main result is to show that a convex program can be used to derive a
signal-dependent time-expanded policy which achieves the best possible
Lagrangian reward in the worst case. The worst-case guarantee is matched by
so-called "Diamonds in the Rough" instances; the proof that the guarantees
match is based on showing that two different convex programs have the same
optimal solution for these specific instances. These results also extend to the
budgeted case as in Frazier et al. We also show that the optimal policy is
monotone with respect to information, i.e., the approximation ratio of the
optimal policy improves as the signals become more informative.
Comment: WINE 201
On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic bandits
has drawn much interest in understanding its theoretical properties. One
important benefit of the algorithm is that it allows domain knowledge to be
conveniently encoded as a prior distribution to balance exploration and
exploitation more effectively. While it is generally believed that the
algorithm's regret is low (high) when the prior is good (bad), little is known
about the exact dependence. In this paper, we fully characterize the
algorithm's worst-case dependence of regret on the choice of prior, focusing on
a special yet representative case. These results also provide insights into the
general sensitivity of the algorithm to the choice of priors. In particular,
in terms of the prior probability mass of the true reward-generating model,
we prove regret upper bounds for the bad- and good-prior cases, respectively,
as well as matching lower bounds. Our proofs rely on the discovery of a
fundamental property of Thompson
Sampling and make heavy use of martingale theory, both of which appear novel in
the literature, to the best of our knowledge.
Comment: Appears in the 27th International Conference on Algorithmic Learning Theory (ALT), 201
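The prior's role in Thompson Sampling can be seen in a minimal Bernoulli-bandit sketch (illustrative code, not the paper's construction): the Beta pseudo-counts encode the prior, and a prior that concentrates mass on a bad arm must first be washed out by observed rewards before the algorithm recovers.

```python
import random

def thompson_sampling(reward_probs, horizon, prior=(1.0, 1.0), seed=0):
    """Bernoulli Thompson Sampling with a Beta prior on each arm.

    `prior` is the (alpha, beta) pseudo-count pair encoding domain
    knowledge: a "good" prior that favours the true best arm speeds up
    convergence, while a "bad" prior must first be unlearned from data.
    Returns the total reward collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(reward_probs)
    alpha = [prior[0]] * k
    beta = [prior[1]] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean estimate for each arm from its Beta posterior
        # and play the arm with the highest sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < reward_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

With arms of mean 0.3 and 0.7 and a uniform prior, the total reward over a long horizon approaches that of always playing the better arm.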
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing an inspiration from behavioral studies of human decision making, we
propose here a general parametric framework for the multi-armed bandit problem,
which extends the standard Thompson Sampling approach to incorporate reward
processing biases associated with several neurological and psychiatric
conditions, including Parkinson's and Alzheimer's diseases,
attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
We demonstrate empirically that the proposed parametric approach can often
outperform the baseline Thompson Sampling on a variety of datasets. Moreover,
from the behavioral modeling perspective, our parametric framework can be
viewed as a first step towards a unifying computational model capturing reward
processing abnormalities across multiple mental conditions.
Comment: Conference on Artificial General Intelligence, AGI-1
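The abstract does not spell out the parametric form of the biases; one simple possibility in this spirit (the weights `w_pos` and `w_neg` are hypothetical illustrations, not the paper's parameters) is to weight positive and negative outcomes asymmetrically in the Beta posterior update used by Thompson Sampling:

```python
def biased_update(alpha, beta, reward, w_pos=1.0, w_neg=1.0):
    """One Beta-posterior update with asymmetric reward weighting.

    w_pos and w_neg are hypothetical bias parameters: w_pos < 1 could
    mimic blunted sensitivity to rewards, while w_neg > 1 could mimic
    heightened sensitivity to negative outcomes. Setting
    w_pos = w_neg = 1 recovers the standard Thompson Sampling update
    for a Bernoulli arm.
    """
    if reward:
        return alpha + w_pos, beta
    return alpha, beta + w_neg
```

Fitting such weights to behavioural data is one way a single parametric family could span both standard Thompson Sampling and biased variants.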
Understanding the Spatial Clustering of Severe Acute Respiratory Syndrome (SARS) in Hong Kong
We applied cartographic and geostatistical methods in analyzing the patterns of disease spread during the 2003 severe acute respiratory syndrome (SARS) outbreak in Hong Kong using geographic information system (GIS) technology. We analyzed an integrated database that contained clinical and personal details on all 1,755 patients confirmed to have SARS from 15 February to 22 June 2003. Elementary mapping of disease occurrences in space and time simultaneously revealed the geographic extent of spread throughout the territory. Statistical surfaces created by the kernel method confirmed that SARS cases were highly clustered and identified distinct disease "hot spots." Contextual analysis of mean and standard deviation of different density classes indicated that the period from day 1 (18 February) through day 16 (6 March) was the prodrome of the epidemic, whereas days 86 (15 May) to 106 (4 June) marked the declining phase of the outbreak. Origin-and-destination plots showed the directional bias and radius of spread of superspreading events. Integration of GIS technology into routine field epidemiologic surveillance can offer a real-time quantitative method for identifying and tracking the geospatial spread of infectious diseases, as our experience with SARS has demonstrated.
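The kernel-method surfaces described above can be sketched in a few lines (a generic Gaussian kernel density estimate, not the study's exact parameterisation): each case location contributes a smooth bump, and local maxima of the summed surface mark the hot spots.

```python
import math

def kernel_density_surface(points, grid, bandwidth):
    """Gaussian kernel density estimate of case intensity on a grid.

    `points` are (x, y) case locations, `grid` is a list of (x, y)
    evaluation sites, and `bandwidth` controls the smoothing radius.
    Spatial clusters ("hot spots") appear as local density maxima.
    """
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    surface = []
    for gx, gy in grid:
        # Sum the Gaussian kernel contribution of every case location.
        density = sum(
            math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * bandwidth ** 2))
            for x, y in points
        )
        surface.append(density * norm)
    return surface
```

Evaluating the surface near a cluster of cases yields a much higher density than at isolated or empty locations, which is what makes the hot spots stand out on the map.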
Molecular Identification of Spirometra erinaceieuropaei Tapeworm in Cases of Human Sparganosis, Hong Kong
Human sparganosis is a foodborne zoonosis endemic in Asia. We report a series of 9 histologically confirmed human sparganosis cases in Hong Kong, China. All parasites were retrospectively identified as Spirometra erinaceieuropaei. Skin and soft tissue swelling was the most common symptom, followed by central nervous system lesions.
Treatment of severe acute respiratory syndrome with lopinavir/ritonavir: A multicentre retrospective matched cohort study
Objectives. To investigate the possible benefits and adverse effects of the addition of lopinavir/ritonavir to a standard treatment protocol for the treatment of severe acute respiratory syndrome. Design. Retrospective matched cohort study. Setting. Four acute regional hospitals in Hong Kong. Patients and methods. Seventy-five patients with severe acute respiratory syndrome treated with lopinavir/ritonavir in addition to a standard treatment protocol adopted by the Hospital Authority were matched with controls retrieved from the Hospital Authority severe acute respiratory syndrome central database. Matching was done with respect to age, sex, the presence of co-morbidities, lactate dehydrogenase level and the use of pulse steroid therapy. The 75 patients treated with lopinavir/ritonavir were divided into two subgroups for analysis: lopinavir/ritonavir as initial treatment, and lopinavir/ritonavir as rescue therapy. These groups were compared with matched cohorts of 634 and 343 patients, respectively. Outcomes including overall death rate, oxygen desaturation, intubation rate, and use of pulse methylprednisolone were reviewed. Results. The addition of lopinavir/ritonavir as initial treatment was associated with a reduction in the overall death rate (2.3%) and intubation rate (0%), when compared with a matched cohort who received standard treatment (15.6% and 11.0% respectively, P<0.05) and a lower rate of use of methylprednisolone at a lower mean dose. The subgroup who had received lopinavir/ritonavir as rescue therapy, showed no difference in overall death rate and rates of oxygen desaturation and intubation compared with the matched cohort, and received a higher mean dose of methylprednisolone. Conclusion. The addition of lopinavir/ritonavir to a standard treatment protocol as an initial treatment for severe acute respiratory syndrome appeared to be associated with improved clinical outcome. 
A randomised double-blind placebo-controlled trial is recommended during future epidemics to further evaluate this treatment.
Prospective modelling of environmental dynamics. A methodological comparison applied to mountain land cover changes
During the last 10 years, scientists have made significant advances in modelling environmental dynamics. A wide range of new methodological approaches in geomatics - such as neural networks, multi-agent systems or fuzzy logic - was developed. Despite this progress, the available modelling software packages have to be considered as experimental tools rather than as mature procedures ready for environmental management or decision support. In particular, the authors consider that a large number of publications suffer from a lack of validation of the model results. This contribution describes three different modelling approaches applied to prospective land cover prediction. The first one, a combined geomatic method, uses Markov chains for temporal transition prediction, while the spatial assignment of the transitions is supervised manually through the construction of suitability maps. Compared to this directed method, the two others may be considered semi-automatic because both the polychotomous regression and the multilayer perceptron only need to be optimized during a training step - the algorithms themselves detect the spatio-temporal changes in land cover. The authors describe the three methodological approaches and their practical application to two mountain study areas: one in the French Pyrenees, the second including a large part of the Sierra Nevada, Spain. The article focuses on the comparison of results. The main finding is that prediction scores are higher where land cover is more persistent. The authors also underline that the geomatic model is complementary to the statistical ones, which achieve a higher overall prediction rate but produce worse simulations when land cover changes are numerous.
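The Markov-chain step of the first (geomatic) approach can be sketched as follows (illustrative code, not the authors' implementation): transition probabilities estimated from cell counts between two map dates are applied repeatedly to the current distribution of land-cover classes.

```python
def markov_projection(transition_counts, state, steps):
    """Project land-cover class shares with a first-order Markov chain.

    transition_counts[i][j] is the number of cells observed changing
    from class i to class j between two map dates. Each row is
    normalised into transition probabilities, which are then applied
    `steps` times to the current class distribution `state`.
    """
    probs = []
    for row in transition_counts:
        total = sum(row)
        probs.append([c / total for c in row])
    n = len(state)
    for _ in range(steps):
        # One projection step: new share of class j is the
        # probability-weighted inflow from every class i.
        state = [sum(state[i] * probs[i][j] for i in range(n))
                 for j in range(n)]
    return state
```

Note that this only predicts *how much* of each class to expect; deciding *where* on the map each predicted transition lands is exactly the spatial-assignment step that the suitability maps supervise in the combined geomatic method.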